Abstract

Speculative execution is a method to increase in- 
struction level parallelism which can be exploited by 
both super-scalar and VLIW architectures. The key
to a successful general speculation strategy is a repair 
mechanism to handle mispredicted branches and ac- 
curate reporting of exceptions for speculated instruc- 
tions. Multiple instruction rollback is a technique 
developed for recovery from transient processor fail- 
ures. Many of the difficulties encountered during re- 
covery from branch misprediction or from instruction 
re-execution due to exceptions in a speculative exe- 
cution architecture are similar to those encountered 
during multiple instruction rollback. 

This paper investigates the applicability of a re- 
cently developed compiler-assisted multiple instruc- 
tion rollback scheme to aid in speculative execution 
repair. Extensions to the compiler-assisted scheme
to support branch and exception repair are presented 
along with performance measurements across ten ap- 
plication programs. 

1 Introduction 

Super-scalar and VLIW architectures have been 
shown effective in exploiting instruction level paral- 
lelism (ILP) present in a given application [1-3]. Cre- 
ating additional ILP in applications has been the sub- 
ject of study in recent years [4-6]. Code motion within 
a basic block is insufficient to unlock the full potential 
of super-scalar and VLIW processors with issue rates
greater than two [3]. Given a trace of the most frequently
executed basic blocks, limited code movement
across block boundaries can create additional ILP at 
the expense of requiring complex compensation code 
to ensure program correctness [7]. Combining multiple 
basic blocks into superblocks permits code movement 
within the superblock without the compensation code 
required in standard trace scheduling [3]. 

General upward and downward code movement 
across trace entry points (joins) and general down- 
ward code motion across trace exit points (branches, 
or forks) is permitted without the need for special 
hardware support [7]. Sophisticated hardware support 
is required, however, for unrestricted upward code mo- 
tion across a branch boundary. Such code motion 
is referred to as speculative execution and has been 
shown to substantially enhance performance over non- 
speculated architectures [8-10]. This paper focuses on 
the support hardware for speculative execution, which 
ensures correct operation in the presence of except- 
ing speculated instructions (referred to as exception 
repair) and of mispredicted branches (referred to as 
branch repair). It is shown that data hazards which re- 
sult from exception and branch repair are very similar 
to data hazards that result from multiple instruction 
rollback, and that techniques used to resolve rollback 
data hazards are applicable to exception and branch 
repair. 

The remainder of the paper is organized as follows. 
Section 2 gives a brief overview of a compiler-assisted
multiple instruction rollback (MIR) scheme to be used 
as a base for application to speculative execution re- 
pair (SER). Section 3 describes speculative execution 
and the requirements for exception repair and branch 
repair. Section 4 introduces a schedule reconstruction
scheme and extends the compiler-assisted rollback
scheme. Section 5 describes read buffer flush costs and
Section 6 presents performance impacts which result
from read buffer flushes.


2 Compiler-Assisted Multiple Instruction Rollback Recovery

2.1 Hazard Classification 


Within a general error model, data hazards resulting from instruction retry are of two types [11-13]. On-path hazards are those encountered when the instruction path after rollback is the same as the initial path, and branch hazards are those encountered when the instruction path after rollback is different from the initial path. As shown in Figure 1, r_x represents an on-path hazard: during the initial instruction sequence r_x is written, and after rollback it is read prior to being rewritten. As shown in Figure 2, r_y represents a branch hazard: the initial instruction sequence writes r_y, and after rollback r_y is read prior to being rewritten, but this time not along the original path.

Figure 1: On-path data hazard.

Figure 2: Branch data hazard.

2.2 On-path Hazard Resolution Using a 
Read Buffer 

Figure 3: Read buffer.

Hardware support consisting of a read buffer of size 2N, as shown in Figure 3, has been shown to be effective in resolving on-path hazards [11-13]. The read buffer maintains a window of register read history. If an on-path hazard is present, then prior to writing over the old value of the hazard register, a read of that value must have taken place within the last N instructions (otherwise, after a rollback of ≤ N instructions, a read of the hazard register would not occur before a redefinition). Key to this scenario is the fact that the original path is repeated. Branch hazard resolution is left to the compiler. At rollback, the read buffer is flushed back to the general purpose register file (GPRF), restoring the register file to a restartable state. The primary advantage of the read buffer is that it does not require an additional read port as with a history buffer, replication of the GPRF as with the future file, or bypass logic as with the reorder buffer or delayed write buffer [14,15].
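The read buffer behavior described above can be illustrated with a small software model. This is only a sketch under stated assumptions: the class name `ReadBuffer`, the helper `execute`, the register names, and the choice N = 4 are all hypothetical, and the model ignores timing and port constraints of real hardware. It logs every source-register read and, at rollback, writes the log back newest-to-oldest so that the oldest logged value of each register is the one that survives.

```python
from collections import deque


class ReadBuffer:
    """Toy model of a read buffer holding the last `depth` register reads."""

    def __init__(self, depth):
        # Oldest entries fall off the far end once the window is full.
        self.entries = deque(maxlen=depth)

    def record_read(self, reg, value):
        # Every register read is logged together with the value that was read.
        self.entries.append((reg, value))

    def flush(self, regfile):
        # Walk the history newest-to-oldest so that, for each register,
        # the oldest logged value is the one left in the register file.
        while self.entries:
            reg, value = self.entries.pop()
            regfile[reg] = value


def execute(regfile, rb, dest, srcs, fn):
    """Execute dest = fn(*srcs), logging each source read before the write."""
    operands = []
    for reg in srcs:
        rb.record_read(reg, regfile[reg])
        operands.append(regfile[reg])
    regfile[dest] = fn(*operands)


N = 4                                   # assumed maximum rollback distance
regs = {"r1": 5, "r2": 7}
rb = ReadBuffer(depth=2 * N)            # up to two source reads per instruction

execute(regs, rb, "r1", ["r1", "r2"], lambda a, b: a + b)   # r1 = r1 + r2
execute(regs, rb, "r2", ["r1"], lambda a: a * 2)            # r2 = r1 * 2
corrupted = dict(regs)                  # state at the time the error is detected

rb.flush(regs)                          # rollback: restore the pre-read values
```

Both instructions overwrite a register that was read earlier in the window, so both are on-path hazards, and the flush returns the register file to the state it had before the first instruction.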

2.3 Branch Hazard Removal Compiler 
Transformations 

Compiler transformations have been shown to be effective in resolving branch hazards [11,12]. Branch hazard resolution occurs at three levels: 1) pseudo code, 2) machine code, and 3) post-pass. Resolution at the pseudo code level would be accomplished by renaming the pseudo register r_y of the defining instruction (Figure 2) to a different pseudo register. Node splitting, loop expansion, and loop protection transformations aid in breaking pseudo register equivalence relationships so that renaming can be performed. After the pseudo registers are mapped to physical registers, some branch hazards could reappear. This is prevented at the machine code level by adding hazard constraints to live range constraints prior to register allocation. Branch hazards that remain after the first two levels can be resolved either by creating a "covering" on-path hazard or by inserting nop (no operation) instructions ahead of the hazard instruction until the rollback is guaranteed to be under the branch. Given the branch hazard of Figure 2, a covering on-path hazard is created by inserting a MOV r_y, r_y instruction immediately before the instruction in which r_y is defined. This guarantees that
the old value of r_y is loaded into the read buffer and is available to restore the register file during rollback.

3 Speculative Execution

Figures 4 and 5 illustrate the two basic problems which are encountered when attempting upward code motion across a branch. As shown in Figure 4, if the speculated instruction (i.e., an instruction moved upward past one or more branches) modifies the system state, and due to the branch outcome the speculated instruction should not have been executed, program correctness could be affected. Figure 5 illustrates that if the speculated instruction causes an exception, and again due to the branch outcome the excepting instruction should not have been executed, program performance or even program correctness could be affected.

Figure 4: r_1 in live_out of taken path.

Figure 5: Speculated instruction traps.

3.1 Branch Repair

Figure 6 shows an original instruction schedule and a new schedule after speculation. Instructions d, i, and f have been speculated above branches c and g from their respective fall-through paths (see footnote 2). Speculated instructions are marked "(s)." The motivation for such a schedule might be to hide the load delay of the speculated instructions or to allow more time for the operands of the branch instructions to become available. If c commits to the taken path (i.e., it is mispredicted by the static scheduler), some changes to the system state that have resulted from the execution of d, i, and f may have to be undone. No update is required for the PC; execution simply begins at j. If instead c commits to the fall-through path but g commits to the taken path, then only i's changes to the system state may have to be undone.

2 For this example it is assumed that the fall-through paths are the most likely outcome of the branch decisions at c and g.

Not all changes to the system state are equally important. If, for example, d writes to register r_x and r_x ∉ live_in(j) (i.e., along the path starting at j, a redefinition of r_x will be encountered prior to a use of r_x [16]), then the original value of r_x does not have to be restored. Inconsistencies in the system state as a result of mispredicted branches exhibit similarities to branch hazards in multiple instruction rollback [11,12]. Given this similarity between branch hazards due to instruction rollback and branch hazards due to speculative execution, compiler-driven data-flow manipulations, similar to those developed to eliminate branch hazards for MIR [11,12], can be used to resolve branch hazards that result from speculation. Such compiler transformations have been proposed for branch misprediction handling [9]. Since re-execution of speculated instructions is not required for branch misprediction, compiler resolution of branch hazards becomes a sufficient branch repair technique.
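The live_in test above reduces to a set intersection. The following is a minimal sketch, assuming the compiler already has the set of registers written by the speculated instructions and the live_in set of the path actually taken; the function name `registers_to_restore` and the register assignments in the example are hypothetical and are not taken from Figure 6.

```python
def registers_to_restore(speculated_writes, live_in_target):
    """Return the registers whose old values matter after a misprediction.

    speculated_writes: registers written by instructions speculated above
                       the mispredicted branch.
    live_in_target:    registers live on entry to the path actually taken.
    """
    # A speculated write is harmless unless the overwritten register is
    # read on the taken path before being redefined, i.e. it is live-in.
    return set(speculated_writes) & set(live_in_target)


# Hypothetical data: d, i, and f write r1, r2, and r3; r2 and r4 are live at j.
writes_above_c = {"d": "r1", "i": "r2", "f": "r3"}
live_in_j = {"r2", "r4"}
hazards = registers_to_restore(writes_above_c.values(), live_in_j)
```

In this made-up case only r2 is a branch hazard; r1 and r3 are redefined before use along the path starting at j, so their old values need not be preserved.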

3.2 Exception Repair 

Figure 6 also demonstrates the handling of speculated trapping instructions. If d is a trapping instruction and an exception occurred during its execution, handling of the exception must be delayed until c commits so that changes to the system state are minimized, and in some cases to ensure that repair is possible in the event that c is mispredicted. If c commits to the taken path, the exception is ignored and d is handled like any other speculated instruction given a branch mispredict. If c was correctly predicted, three exception repair strategies are possible. The first is to undo the effects of only those instructions speculated above c (i.e., d, i, and f) and then branch to a recovery block RB_c [10] as shown in Figure 6. The address of the recovery block can be obtained by using the PC value of the excepting instruction as an index into a hash table. This strategy ensures precise interrupts [14,17] relative to the nonspeculated schedule but not relative to the original schedule. Recovery blocks can cause significant code growth [10]. The second strategy undoes the effects of all instructions subsequent to d (i.e., i, b, and f), handles the exception, and resumes execution at instruction i [9]. This strategy provides restartable states and does not require recovery blocks. A third exception repair strategy undoes the effects of only those subsequent instructions that are speculated above c (i.e., only i and f), handles the exception, and resumes execution at instruction i, this time executing only speculated instructions until c is reached. The improved efficiency of strategy 3 over that of strategy 2 comes at the cost of slightly more complex exception repair hardware.

When a branch commits and is mispredicted, the exception repair hardware must perform three functions: 1) determine whether an exception has occurred during the execution of a speculated instruction, 2) if an exception has occurred, determine the PC value of the excepting instruction, and 3) determine which changes to the system state must be undone. Functions 1 and 2 are similar to error detection and location
in multiple instruction rollback. Function 3 is similar 
to on-path hazard resolution in multiple instruction 
rollback [11,12,18]. On-path hazards assume that af- 
ter rollback the initial instruction sequence from the 
faulty instruction to the instruction where the error 
was detected is repeated. 

Figure 7: Exception repair.

Figure 7 illustrates the speculation of a group of instructions and re-execution strategy 3. The load instruction traps, but the exception is not handled until the branch instruction commits to the fall-through path. Control is then returned to the trapping instruction. This scenario is identical to multiple instruction rollback where an error occurs during the load instruction and is detected during the branch instruction. For this example, only r_1 must be restored during rollback since r_4 and r_5 will be rewritten prior to use during re-execution. Figure 7 shows that exception repair hazards in speculative execution are the same as on-path hazards in multiple instruction rollback, and a read buffer as described in Section 2 can be used to resolve these hazards. The depth of the read buffer is the maximum distance from I_j to I_n along any backwards walk (see footnote 3), where I_n is a trapping instruction that was speculated above branch instruction I_j.
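The rule used in the example above (a register must be restored only if re-execution would read it before rewriting it) can be sketched as a single pass over the re-executed sequence. The function name `on_path_hazards` and the instruction sequence below are hypothetical; the sequence is chosen to mirror the r_1, r_4, r_5 behavior described in the text and is not the actual content of Figure 7.

```python
def on_path_hazards(sequence):
    """Registers that must be restored before re-executing `sequence`.

    sequence: list of (dest, srcs) tuples in execution order; dest is None
              for instructions that write no register (e.g. a branch).
    A register is a hazard if the sequence writes it and, on re-execution,
    it would be read before it has been rewritten.
    """
    written = {dest for dest, _ in sequence if dest is not None}
    defined = set()      # registers already rewritten during re-execution
    hazards = set()
    for dest, srcs in sequence:
        for reg in srcs:
            if reg in written and reg not in defined:
                hazards.add(reg)
        if dest is not None:
            defined.add(dest)
    return hazards


# Hypothetical sequence from the trapping load to the committing branch.
seq = [
    ("r4", ["r1"]),      # load   r4 <- mem[r1]   (trapping instruction)
    ("r1", ["r1"]),      # add    r1 <- r1 + 4
    ("r5", ["r4"]),      # add    r5 <- r4 + 1
    (None, ["r5"]),      # branch on r5
]
needs_restore = on_path_hazards(seq)
```

Here r4 and r5 are rewritten before they are read again, so only r1 has to come back from the read buffer.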

3.3 Schedule Reconstruction 

Assumed in Figures 6 and 7 are mechanisms to 
identify speculative instructions, determine the PC 
value of excepting speculated instructions, and deter- 
mine how many branches a given instruction has been 
speculated above. An example of the latter case is
shown in Figure 6, where instructions d, i, and f are
undone if c is mispredicted; however, only i must be
undone if g is mispredicted.

If the hardware had access to the original code 
schedule, the design of these mechanisms would be 
straightforward. Unfortunately, static scheduling re- 
orders instructions at compile-time and information as 
to the original code schedule is lost. To enable recov- 
ery from mispredicted branches and proper handling 
of speculated exceptions, some information relative to 
the original instruction order must be present in the 
compiler-emitted instructions. This will be referred to 
as schedule reconstruction.

By limiting the flexibility of the scheduler, less in- 
formation about the original schedule is required. For 
example, if speculation is limited to one level only 
(i.e., above a single branch), a single bit in the opcode 
field is sufficient to indicate that the instruction has 
been moved above the next branch [8]. The hardware 
would then know exactly which instruction effects to 
undo (i.e., the ones with this bit set). Also, remov- 
ing branch hazards directly with the compiler permits 
general speculation with no schedule reconstruction 
for branch repair [9]. 

4 Implicit Index Schedule Reconstruction

Implicit index scheduling supports general speculation of regular and trapping instructions. The scheme was inspired by the handling of stores in the sentinel scheduling scheme [9] and was designed to exploit the unique properties of the read buffer hardware design described in Section 2. Schedule reconstruction is accomplished by marking each instruction speculated or nonspeculated by including a bit in the opcode field, and using this encoding to maintain an operand history of speculated instructions in a FIFO queue called a speculation read buffer (SRB). The SRB operates similarly to a read buffer, with additional provisions for exception handling.

3 A walk is a sequence of edge traversals in a graph where the edges visited can be repeated [19].

4.1 Exception Repair Using a Speculation 
Read Buffer 

Figure 8 shows an original code schedule and two speculative schedules, along with the contents of the SRB at the time branches I_c and I_g commit. Instructions I_d and I_f have been speculated above branch instruction I_c, and I_i has been speculated above both I_g and I_c. The encoding of speculated instructions informs the hardware that the source operand values are to be saved in the SRB, along with the corresponding register addresses and the PC of the speculated instruction.

Speculated instructions execute normally unless 
they trap. If a speculated instruction traps, the ex- 
ception bit in the SRB which corresponds to the trap- 
ping instruction is set and program execution contin- 
ues. Subsequent instructions that use the result of the 
trapping instruction are allowed to execute normally. 

A chk_except(k) instruction is placed in the home block of each speculated instruction. Only one chk_except(k) instruction is required for a home block. As the name implies, chk_except(k) checks for pending exceptions. The command can simultaneously interrogate each location in the SRB by utilizing the bit field k. As shown in schedule 1 of Figure 8, chk_except(001111) checks for pending exceptions in the SRB locations selected by its bit field. If a checked exception bit is set, the SRB is flushed in reverse order, restoring the appropriate register and PC values. Execution can then begin with the excepting instruction.
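The interaction between exception bits, the chk_except(k) bit field, and the reverse-order flush can be modeled in a few lines. This is a rough sketch, not the hardware design: the class and method names, the one-entry-per-saved-operand layout, the convention that bit i of the mask selects entry i, and the PCs and register values in the example are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SRBEntry:
    pc: int                  # PC of the speculated instruction
    reg: str                 # register address of the saved source operand
    value: int               # operand value at the time it was read
    exception: bool = False  # set if the owning instruction trapped


class SpeculationReadBuffer:
    def __init__(self):
        self.entries = []    # index 0 is the oldest entry

    def save(self, pc, reg, value):
        self.entries.append(SRBEntry(pc, reg, value))

    def mark_exception(self, pc):
        # Set the exception bit on every entry owned by the trapping PC.
        for entry in self.entries:
            if entry.pc == pc:
                entry.exception = True

    def chk_except(self, mask, regfile):
        """Check the entries selected by `mask` (bit i selects entry i).

        Returns the PC to restart from, or None if nothing is pending.
        """
        pending = [i for i, e in enumerate(self.entries)
                   if (mask >> i) & 1 and e.exception]
        if not pending:
            return None
        first = min(pending)
        restart_pc = self.entries[first].pc
        # Flush in reverse order, back through the excepting instruction.
        while len(self.entries) > first:
            entry = self.entries.pop()
            regfile[entry.reg] = entry.value
        return restart_pc


regs = {"r1": 100, "r2": 1, "r7": 2}
srb = SpeculationReadBuffer()

srb.save(0x10, "r1", regs["r1"])      # speculated load reads r1 and traps
srb.mark_exception(0x10)
srb.save(0x14, "r2", regs["r2"])      # later speculated instruction
regs["r2"] = 99                       # ...overwrites r2
srb.save(0x18, "r7", regs["r7"])      # another speculated instruction
regs["r7"] = 77                       # ...overwrites r7

restart = srb.chk_except(0b111, regs)
```

After the check, the register file holds the values it had before the trapping instruction first executed, and `restart` is the PC of that instruction.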

Figure 8 illustrates several on-path hazards which are resolved by the SRB. In schedule 1, if I_d traps and the branch I_c commits to the taken path, I_i has corrupted r_2 and I_f has corrupted r_7. Flushing the SRB up through I_d restores both registers to their values prior to the initial execution of I_d. Note that register r_6 is also corrupted but not restored by the SRB, since after rollback r_6 will be rewritten with a correct value before the corrupted value is used.

As an alternative to checking for exceptions in each home block, the exception could be handled when the exception bit reaches the bottom of the SRB. This is similar to the reorder buffer used in dynamic scheduling [14] and eliminates the cost of the chk_except(k) command; however, it increases the exception handling latency, which can impact performance depending on the frequency of exceptions.

Figure 8: Exception repair using a speculation read buffer (SRB).

Implicit index scheduling derives its name from the 
ability of the compiler to locate a particular register 
value within the SRB. This is possible only if the dy- 
namically occurring history of speculated instructions 
is deterministic at branch boundaries. Superblocks 
guarantee this by ensuring that the sole entry into the 
superblock is at the header and by limiting specula- 
tion to within the superblock. For standard blocks, 
bookkeeping code [7] can be used to ensure this deter- 
ministic behavior. 

4.2 Branch Repair Using a Speculation 
Read Buffer 

As described in Section 2, branch repair can be han- 
dled by resolving branch hazards with the compiler. 
Branch hazard resolution in multiple instruction roll- 
back can be assisted by the read buffer when cover- 
ing on-path hazards are present, reducing the perfor- 
mance cost of variable renaming [11, 12]. In a similar 
fashion, the SRB can assist in branch repair. Figure 
9 shows the original code schedule and the two spec- 
ulative schedules of Figure 8. For this example, it is 
assumed that r_2, r_3, r_6, and r_7 are elements of both
live_in(I_j) and live_in(I_k).

As shown in schedule 1, if branch instruction I_c commits to the taken path, r_2, r_6, and r_7, which were modified in I_i, I_d, and I_f, respectively, must be restored. If instead I_c commits to the fall-through path and I_g commits to the taken path, only r_2 must be restored. Registers r_2 and r_7 are rollback hazards that result from exception repair; therefore, the SRB contains their unmodified values. By including a flush(k) command at the targets of I_c and I_g, the SRB can be used to restore r_2 and/or r_7 given a misprediction of I_c or I_g.

The flush(k) command selectively flushes the appropriate register values given a branch misprediction. For example, in schedule 2 of Figure 9, if I_c is predicted correctly and I_g is mispredicted, the SRB is flushed in reverse order up through I_i, restoring value(r_2) from I_i but not restoring value(r_7) from I_f. Since speculation is always from the most probable branch path,
the flush(k) command is always placed on the most 
improbable branch path, minimizing the performance 
penalty. Not all branch hazards are resolved by the 
presence of on-path hazards. These remaining haz- 
ards can be resolved with compiler transformations. 


5 SRB Flush Penalty 

The examples of Section 4 demonstrate that 
compiler-assisted multiple instruction rollback can be 
applied to both branch repair and exception repair in a 
speculative execution architecture. The flush penalty 
of the read buffer is not a key concern in multiple in- 
struction rollback applications since instruction faults 
are typically very rare. In application to exception re- 
pair in speculative execution, the SRB flush penalty is 
also not a major concern due to the infrequency of ex- 
ceptions involving speculated instructions. However, 
in application to branch repair, the SRB flush penalty 
could produce significant performance impacts. Stud- 
ies of branch behavior show a conditional branch fre- 
quency of 11% to 17% [20]. Static branch prediction 
methods result in branch mispredictions in the range 
of 5% to 15%. This results in a branch repair fre- 
quency as high as 2.5%. Assuming a CPI (clock cycles 
per instruction) rate of one and an average SRB flush 
penalty of ten cycles, the performance overhead of the 
flush mechanism would reach 22.5%. This indicates 
the importance of minimizing the amount of redun- 
dant data stored in the SRB so that the flush penalty 
is reduced. 
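The estimate above is the product of three factors: branch frequency, misprediction rate, and flush penalty, divided by the base CPI. A minimal sketch of that arithmetic follows; the function name and parameter names are hypothetical. Taking the upper bounds quoted above (17% and 15%) gives a repair frequency of about 2.55%, while the 22.5% overhead figure corresponds to a repair frequency of 2.25% (for example, 15% of instructions being branches with 15% of them mispredicted).

```python
def flush_overhead_pct(branch_freq, mispredict_rate, flush_cycles, cpi=1.0):
    """Estimated run-time increase (%) caused by SRB flushes.

    branch_freq:      fraction of instructions that are conditional branches
    mispredict_rate:  fraction of those branches that are mispredicted
    flush_cycles:     average SRB flush penalty per branch repair
    cpi:              base clock cycles per instruction
    """
    repairs_per_instruction = branch_freq * mispredict_rate
    extra_cycles_per_instruction = repairs_per_instruction * flush_cycles
    return 100.0 * extra_cycles_per_instruction / cpi


upper_bounds = flush_overhead_pct(0.17, 0.15, 10)   # 17% branches, 15% missed
quoted_case = flush_overhead_pct(0.15, 0.15, 10)    # 2.25% repair frequency
```

Because the overhead scales linearly with the flush penalty, halving the number of entries that must be flushed halves the estimated cost, which is the motivation for storing less redundant data in the SRB.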

Recently, a technique was proposed to reduce the amount of redundant data in a read buffer so that the read buffer size could be reduced [12,13]. A similar technique can be used to ensure that only the data required for branch and exception repair is stored in the SRB. In the implicit index scheme of Section 4, a bit indicating whether an instruction is speculated is added to the opcode field. By expanding this field to two bits, operand storage requirements can be specified. Figure 10 shows the reduced contents of the SRB given schedule 1 of Figure 9. In the modified scheme, only the first read of r_7 must be maintained. Register r_3 is not required since it was not modified. The improved scheme also eliminates blank spaces in the SRB. For this example, the misprediction of I_c in schedule 1 of Figure 9 results in four fewer variables to flush.

Figure 9: Branch repair using a speculation read buffer (SRB).

Figure 11: Instrumentation code placement.

The coding of the two speculation bits would be as follows: 00) no save required, 01) save operand 1, 10) save operand 2, and 11) save both operands. If neither operand of a speculated instruction has to be saved in the SRB, the instruction is not marked as speculated. This is not a problem for branch repair; however, if such an instruction traps, the hardware would have no way of knowing not to handle the exception immediately. There would also be no entry in the SRB for the exception bit or for the corresponding PC value. One solution to the problem would be to add another bit to the opcode field which marks speculated trapping instructions. A better solution is to code all speculated trapping instructions which have no operands to save as 01. This indicates that exception handling is to be delayed and causes an entry to be reserved in the SRB, at the cost of a slightly increased flush penalty during branch repairs.
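The two-bit coding, including the special case for speculated trapping instructions with nothing to save, can be written out as a small helper. The function name and argument names are hypothetical; only the four codes and the 01 fallback come from the text.

```python
def speculation_bits(save_op1, save_op2, speculated_trapping=False):
    """Return the two-bit operand-save code as a string.

    00 = no save required, 01 = save operand 1,
    10 = save operand 2,   11 = save both operands.
    """
    code = (0b10 if save_op2 else 0) | (0b01 if save_op1 else 0)
    if code == 0 and speculated_trapping:
        # Reserve an SRB entry anyway so the exception bit and the PC
        # of the trapping instruction have somewhere to live.
        code = 0b01
    return format(code, "02b")


codes = [
    speculation_bits(False, False),                            # nothing to save
    speculation_bits(True, False),                             # operand 1 only
    speculation_bits(False, True),                             # operand 2 only
    speculation_bits(True, True),                              # both operands
    speculation_bits(False, False, speculated_trapping=True),  # trap, no saves
]
```

The last case is the one that costs an extra SRB entry: the instruction saves an operand it does not strictly need so that the hardware knows to defer the exception.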

6 Performance Evaluation 
6.1 Evaluation Methodology

In this section, results of a read buffer flush penalty 
evaluation are presented. The instrumentation code 
segments of Figure 11 call a branch error procedure 
which performs the following functions: 

1. Update the read buffer model.


2. Force actual branch errors during program exe- 
cution, allowing execution to proceed along an 
incorrect path for a controlled number of instruc- 
tions. 

3. Terminate execution along the incorrect path and 
restore the required system state from the simu- 
lated read buffer. 

4. Measure the resulting flush cycles during the 
branch repair. 

5. Begin execution along the correct path until the 
next branch is encountered. 

An example instrumentation code segment is shown 
in Figure 12. Parameters, such as operand saving information,
current PC, branch fall-through PC, and
branch target PC values, are passed by the instru- 
mentation code to the branch error procedure. An 
additional miscellaneous parameter contains instruc- 
tion type and information used for debugging. 

Figure 13 gives a high level flow of operation for the 
branch error procedure. When a branch instruction 
in the original application program is encountered, an 
arm_branch flag is set. Prior to the execution of the
next application instruction, the arm_branch flag is
checked, and if set, the branch decision made by the 
application program is set aside. The branch is then 
predicted by the branch prediction model. Four mod- 
els are used in the evaluation: 1) predict taken, 2) pre- 
dict not taken, 3) dynamic prediction, and 4) static 
prediction from profiling information. The dynamic 
prediction model is derived from a two bit counter 
branch target buffer (BTB) design [21] and is the 
only model that requires updating with each predic- 
tion outcome. 
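For readers unfamiliar with two-bit counters, the following is a generic sketch of a two-bit saturating-counter predictor and of how mispredictions could be counted against a recorded branch trace. It is not the BTB design of [21] nor the evaluation's actual model: the class name, the per-PC dictionary, the initial counter value, and the trace are assumptions for illustration.

```python
class TwoBitPredictor:
    """Per-branch two-bit saturating counters (0, 1 = not taken; 2, 3 = taken)."""

    def __init__(self, initial=1):
        self.counters = {}        # branch PC -> counter value in 0..3
        self.initial = initial    # assumed starting state: weakly not taken

    def predict(self, pc):
        return self.counters.get(pc, self.initial) >= 2

    def update(self, pc, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        count = self.counters.get(pc, self.initial)
        self.counters[pc] = min(count + 1, 3) if taken else max(count - 1, 0)


def count_mispredictions(predictor, trace):
    """Replay (pc, taken) pairs, predicting first and then updating."""
    misses = 0
    for pc, taken in trace:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# Hypothetical loop branch: taken four times, falls through once, taken again.
trace = [(0x40, True)] * 4 + [(0x40, False)] + [(0x40, True)] * 3
misses = count_mispredictions(TwoBitPredictor(), trace)
```

The counter absorbs the single not-taken outcome without flipping its prediction, which is why only the cold-start branch and the loop exit are mispredicted in this trace.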

After the branch is predicted, the prediction is 
checked against the actual branch path taken by the 
application program. If the prediction was correct, ex- 
ecution proceeds normally. If the prediction was incor- 
rect, the correct branch path is loaded into the recov- 
ery queue along with a branch error detection (BED) 
latency, and the predicted path is loaded into the PC. 
The BED latency indicates how long the execution of 
instructions is to continue along the incorrect path. 
The branch error time-out flag is set when the BED 
latency is reached. When a branch error is detected, 
the register file state is repaired using the read buffer 
contents. The PC value of the correct branch path is 
obtained from the recovery queue. During branch er- 
ror rollback recovery, the number of cycles required to 
flush the read buffer during branch repair is recorded.

Figure 12: Instrumentation code sequences.

Figure 13: Branch error procedure operation. (Legend: PC - program counter; GPRF - general purpose register file.)

Table 1: Application programs.

Program     Static Size   Description
QUEEN       148           eight-queen program
WC          181           UNIX utility
QSORT       252           quick sort algorithm
CMP         262           UNIX utility
GREP        907           UNIX utility
PUZZLE      932           simple game
COMPRESS    1826          UNIX utility
LEX         6856          lexical analyzer
YACC        8099          parser-generator
CCCP        8775          preprocessor for gnu C compiler


It is assumed for this evaluation that two read buffer entries can be flushed in a single cycle. This corresponds to a split-cycle-save assumption of the general purpose register file [12]. Performance overhead due to read buffer flushes (% increase) is computed as

$$\text{Flush\_OH} = 100 \times \frac{\text{flush\_cycles}}{\text{total\_cycles}}$$

All instructions are assumed to require one cycle for execution. This assumption is conservative since the MIPS processor used for the evaluation requires two cycles for a load. The additional cycles would increase total_cycles and thereby reduce the observed performance overhead. In addition to accurately measuring flush costs, the evaluation verifies the operation of the read buffer and its ability to restore the appropriate system state over a wide range of applications.
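The overhead metric and the two-entries-per-cycle assumption combine as follows. This is a minimal sketch: the function names and the sample numbers are hypothetical, and `total_cycles` is taken as given, exactly as in the formula above.

```python
import math


def flush_cycles(entries, entries_per_cycle=2):
    """Cycles needed to flush `entries` read buffer entries."""
    return math.ceil(entries / entries_per_cycle)


def flush_overhead(flush_sizes, total_cycles):
    """Flush_OH (%) for a run with one flush per listed branch repair.

    flush_sizes:  number of read buffer entries flushed at each repair
    total_cycles: cycle count of the run, one cycle per instruction
    """
    total_flush = sum(flush_cycles(n) for n in flush_sizes)
    return 100.0 * total_flush / total_cycles


# Hypothetical run: three branch repairs flushing 3, 4, and 1 entries
# during a 200-cycle execution.
overhead = flush_overhead([3, 4, 1], total_cycles=200)
```

An odd number of entries still costs a full extra cycle, so the three flushes take 2 + 2 + 1 = 5 cycles in total.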

The instrumentation insertion transformation oper- 
ates on the s-code emitted by the MIPS code generator 
of the IMPACT C compiler [3]. The transformation 
determines which operands require saving in the read 
buffer and inserts calls to the initialization, branch error,
and summary procedures. The resulting s-code
modules are then compiled and run on a DECstation 
3100. For the evaluation, BED latencies from 1 to 10 
were used. Table 1 lists the ten application programs 
evaluated. Static Size is the number of assembly in- 
structions emitted by the code generator, not includ- 
ing the library routines and other fixed overhead. 


6.2 Evaluation Results 


Experimental measurements of read buffer flush overhead (Flush_OH) for various BED latencies are shown in Figures 14 through 23. The four branch prediction strategies used for the evaluation are: 1) predict taken (P_Taken), 2) predict not taken (P_N_Taken), 3) dynamic prediction based on a branch target buffer (Dyn_Pred), and 4) static branch prediction using profiling data (Prof_Pred).

Figure 14: Flush penalty: QUEEN.

Figure 15: Flush penalty: WC.

Flush costs were closely related to branch predic- 
tion accuracies, i.e., the more often a branch was mis- 
predicted, the more often flush costs were incurred. 
In a speculative execution architecture, branch predic- 
tion inaccuracies result in performance impacts in ad- 
dition to the impacts from the branch repair scheme. 
Branch misprediction increases the base run time of 
an application by permitting speculative execution of 
unproductive instructions. Increased levels of specula- 
tion increase the performance impacts associated with 
branch prediction inaccuracies. Only the performance 
impacts associated with read buffer flushes are shown in
Figures 14 through 23.

Figure 16: Flush penalty: COMPRESS.

Figure 17: Flush penalty: CMP.

For nine of the ten applications, P_N_Taken was significantly more accurate or marginally more accurate in predicting branch outcomes than P_Taken. For QSORT, P_Taken was significantly more accurate than P_N_Taken. This result demonstrates that in a speculative execution architecture, it is difficult to guarantee optimal performance across a range of applications given a choice between predict-taken and predict-not-taken branch prediction strategies.

For all but one application, Prof_Pred was more accurate than either P_Taken or P_N_Taken. For CMP, Prof_Pred, P_N_Taken, and Dyn_Pred were nearly perfect in their prediction of branch outcomes. Prof_Pred marginally outperformed Dyn_Pred in all applications except LEX.

The purpose of measuring read buffer flush costs given the recovery from injected branch errors is to establish the viability of using a read buffer design for branch repair for speculative execution. Although in such a speculative schedule only static prediction strategies would be applicable, the Dyn_Pred model was included to better assess how varying branch prediction strategies impact flush costs. Overall, the accuracy of Dyn_Pred fell between P_Taken/P_N_Taken and Prof_Pred.

Figure 18: Flush penalty: PUZZLE.

Figure 19: Flush penalty: QSORT.

Over the ten applications studied, read buffer flush overhead ranged from 49.91% for the P_Taken strategy in CCCP to 0.01% for the P_N_Taken strategy in CMP, given a BED of ten. It can be seen from Figures 14 through 23 that a good branch prediction strategy is key to a low read buffer flush cost. The results show that given a static branch prediction strategy using profiling data, an average BED of ten produces flush costs no greater than 14.8% and an average flush cost of 8.1% across the ten applications studied. This performance overhead is comparable to the overhead expected from a delayed write buffer scheme with a maximum allowable BED of ten [15]. Given a maximum BED of ten and an average BED of less than ten, the flush costs of the read buffer would be less than those of a delayed write buffer, since a delayed write buffer is designed for a worst-case BED and the flush penalty of a read buffer is based on the average BED. The observed flush costs are small in comparison to the substantial performance gain of speculated architectures over that of nonspeculated architectures [8-10].

Figure 20: Flush penalty: GREP. (Flush_OH (%) versus BED latency, 1 to 10, for P_Taken, P_N_Taken, Dyn_Pred, and Prof_Pred.)

Figure 21: Flush penalty: LEX.

Figure 22: Flush penalty: YACC.

Figure 23: Flush penalty: CCCP.

The BED for a given branch in this evaluation cor- 
responds to the number of instructions moved above 
a branch in a speculative schedule. The results of the 
evaluation indicate that if the average number of in- 
structions speculated above a given branch is < 10, 
then the read buffer becomes a viable approach to 
handling branch repair. 


7 Summary 

Speculative execution has been shown to be an ef- 
fective method to create additional instruction level 
parallelism in general applications. Speculating in- 
structions above branches requires schemes to han- 
dle mispredicted branches and speculated instructions 
that trap. 

This paper showed that branch hazards resulting 
from branch mispredictions in speculative execution 
are similar to branch hazards in multiple instruction 
rollback developed for processor error recovery. It was 
shown that compiler techniques previously developed 
for error recovery can be used as an effective branch 
repair scheme in a speculative execution architecture. 
It was also shown that data hazards that result in
rollback due to exception repair are similar to on-path
hazards, suggesting a read buffer approach to exception
repair.

Implicit index scheduling was introduced to exploit 
the unique characteristics of rollback recovery using 
a read buffer approach. The read buffer design was 
extended to include PC values to aid in rollback from 
excepting speculated instructions. 

Read buffer flush penalties were measured by in- 
jecting branch errors into ten target applications and 
measuring the flush cycles required to recover from 
the branch errors using a simulated read buffer. It 
was shown that with a static branch prediction strat- 
egy using profiling data, flush costs under 15% are 
achievable. The results of these evaluations indicate 
that compiler-assisted multiple instruction rollback is 
viable for branch and exception repair in a speculative 
execution architecture.